464 research outputs found
A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification
k Nearest Neighbors (kNN) is one of the most widely used supervised
learning algorithms to classify Gaussian distributed data, but it does not
achieve good results when it is applied to nonlinear manifold distributed data,
especially when only a very limited number of labeled samples is available. In this
paper, we propose a new graph-based kNN algorithm which can effectively
handle both Gaussian distributed data and nonlinear manifold distributed data.
To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by
constructing an -level nearest-neighbor strengthened tree over the graph,
and then compute a TRW matrix for similarity measurement purposes. After this,
the nearest neighbors are identified according to the TRW matrix and the class
label of a query point is determined by the sum of all the TRW weights of its
nearest neighbors. To deal with online situations, we also propose a new
algorithm to handle sequential samples based on local neighborhood
reconstruction. Comparison experiments are conducted on both synthetic data
sets and real-world data sets to demonstrate the validity of the proposed new
kNN algorithm and its improvements over other versions of kNN algorithms.
Given the widespread appearance of manifold structures in real-world problems
and the popularity of the traditional kNN algorithm, the proposed manifold
version of kNN shows promising potential for classifying manifold-distributed
data.
Comment: 32 pages, 12 figures, 7 tables
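The classification rule described above (rank labeled points by a random-walk similarity, then sum the weights per class) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the TRW formula, so the similarity here is assumed to be the decayed random-walk sum (I - alpha*W)^(-1) over a plain symmetrized nearest-neighbor graph, and the tree constraint is omitted. The function names, `alpha`, and the toy data are hypothetical.

```python
import numpy as np

def decayed_walk_similarity(X, n_neighbors=3, alpha=0.5):
    """Assumed similarity: sum over t of (alpha * W)^t = inv(I - alpha * W)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(d[i])[1:n_neighbors + 1]  # index 0 is the point itself
        A[i, nearest] = 1.0
    A = np.maximum(A, A.T)                 # symmetrize the neighbor graph
    W = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    return np.linalg.inv(np.eye(n) - alpha * W)

def classify(S, labels, query, k=3):
    """Sum similarity weights of the k most similar labeled points per class."""
    labeled = np.flatnonzero(labels >= 0)  # -1 marks an unlabeled sample
    top = labeled[np.argsort(S[query, labeled])[::-1][:k]]
    scores = {}
    for j in top:
        scores[labels[j]] = scores.get(labels[j], 0.0) + S[query, j]
    return max(scores, key=scores.get)

# Toy data (hypothetical): two well-separated clusters, one label each.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
              [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]], dtype=float)
labels = np.array([0, -1, -1, -1, 1, -1, -1, -1])
S = decayed_walk_similarity(X, n_neighbors=2, alpha=0.5)
predicted = classify(S, labels, query=3, k=1)
```

Because the similarity is accumulated along graph paths rather than measured as straight-line distance, a query point inherits the label of the cluster it is connected to.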
Folded Polynomial Codes for Coded Distributed -Type Matrix Multiplication
Motivated by its value in practical applications, we
consider the coded distributed matrix multiplication problem of computing
in a distributed computing system with worker nodes and a master
node, where the input matrices and are partitioned into -by-
and -by- blocks of equal-size sub-matrices respectively. For effective
straggler mitigation, we propose a novel computation strategy, named
folded polynomial code, which is obtained by modifying the entangled
polynomial codes. Moreover, we characterize a lower bound on the optimal
recovery threshold among all linear computation strategies when the underlying
field is the real number field, and our folded polynomial codes can achieve this
bound in the case of . Our folded polynomial codes outperform all known
computation strategies for coded distributed matrix multiplication
in terms of recovery threshold, download cost and decoding complexity.
Comment: 14 pages, 2 tables
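To make the idea of a recovery threshold concrete, here is a minimal sketch of a basic polynomial (Vandermonde-style) code over the reals. It is not the folded polynomial code proposed in the paper: only one input matrix is split, by rows, and the general product A @ B is computed rather than the paper's specific problem. All names, sizes, and evaluation points are assumptions chosen for illustration.

```python
import numpy as np

def encode(A, m, points):
    """Split A into m row blocks and evaluate sum_j A_j * x^j at each worker's point."""
    blocks = np.split(A, m, axis=0)
    return [sum(blk * x**j for j, blk in enumerate(blocks)) for x in points]

def decode(results, points, m):
    """Recover A @ B from any m worker results by solving a Vandermonde system."""
    idx = sorted(results)[:m]
    V = np.vander([points[i] for i in idx], m, increasing=True)  # m x m
    stacked = np.stack([results[i] for i in idx])                # m x rows x cols
    coeffs = np.linalg.solve(V, stacked.reshape(m, -1))          # one row per block product
    return coeffs.reshape(-1, stacked.shape[2])                  # stack blocks vertically

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))
m = 2                              # recovery threshold of this simple code
points = [1.0, 2.0, 3.0, 4.0]      # one evaluation point per worker (4 workers)
coded = encode(A, m, points)

# Workers 0 and 2 are stragglers; only workers 1 and 3 return their products.
results = {i: coded[i] @ B for i in (1, 3)}
C = decode(results, points, m)
```

In this sketch any 2 of the 4 workers suffice, so up to 2 stragglers can be tolerated.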
The Lamb shift in the BTZ spacetime
We study the Lamb shift of a two-level atom arising from its coupling to the
conformal massless scalar field, which satisfies the Dirichlet boundary
conditions, in the Hartle-Hawking vacuum in the BTZ spacetime, and find that
the Lamb shift in the BTZ spacetime is structurally similar to that of a
uniformly accelerated atom near a perfectly reflecting boundary in
(2+1)-dimensional flat spacetime. Our results show that the Lamb shift is
suppressed in the BTZ spacetime as compared to that in the flat spacetime as
long as the transition wavelength of the atom is much larger than radius
of the BTZ spacetime while it can be either suppressed or enhanced if the
transition wavelength of the atom is much less than radius. In contrast,
the Lamb shift is always suppressed very close to the horizon of the BTZ
spacetime and remarkably it reduces to that in the flat spacetime as the
horizon is approached although the local temperature blows up there.
Comment: 21 pages, 2 figures
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Current speech large language models build upon discrete speech
representations, which can be categorized into semantic tokens and acoustic
tokens. However, existing speech tokens are not specifically designed for
speech language modeling. To assess the suitability of speech tokens for
building speech language models, we established the first benchmark,
SLMTokBench. Our results indicate that neither semantic nor acoustic tokens are
ideal for this purpose. Therefore, we propose SpeechTokenizer, a unified speech
tokenizer for speech large language models. SpeechTokenizer adopts the
Encoder-Decoder architecture with residual vector quantization (RVQ). Unifying
semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of
speech information hierarchically across different RVQ layers. Furthermore, we
construct a Unified Speech Language Model (USLM) leveraging SpeechTokenizer.
Experiments show that SpeechTokenizer performs comparably to EnCodec in speech
reconstruction and demonstrates strong performance on the SLMTokBench
benchmark. Also, USLM outperforms VALL-E in zero-shot Text-to-Speech tasks.
Code and models are available at
https://github.com/ZhangXInFD/SpeechTokenizer/.
Comment: SpeechTokenizer project page is
https://0nutation.github.io/SpeechTokenizer.github.io
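The residual vector quantization (RVQ) step mentioned above can be illustrated with a small sketch. This shows only the generic quantize-the-residual loop with hand-picked codebooks. It is not SpeechTokenizer's learned encoder, decoder, or training procedure, and the function names and codebook values are hypothetical.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Each layer quantizes the residual left over by the previous layers."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected code vector from every layer."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

# Hypothetical codebooks: a coarse first layer and a finer second layer.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]]),
]
x = np.array([1.2, 0.8])
codes = rvq_encode(x, codebooks)
x_hat = rvq_decode(codes, codebooks)
```

The first layer captures the coarse content of the vector and later layers refine it, which is the property that allows different kinds of information to be assigned to different layers.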
Classification of C3 and C4 Vegetation Types Using MODIS and ETM+ Blended High Spatio-Temporal Resolution Data
The distribution of C3 and C4 vegetation plays an important role in the global carbon cycle and climate change. Knowledge of the distribution of C3 and C4 vegetation at a high spatial resolution over local or regional scales helps us to understand their ecological functions and climate dependencies. In this study, we classified C3 and C4 vegetation at a high resolution for spatially heterogeneous landscapes. First, we generated a land surface reflectance dataset with high spatial and temporal resolution by blending MODIS (Moderate Resolution Imaging Spectroradiometer) and ETM+ (Enhanced Thematic Mapper Plus) data. The blended data exhibited a high correlation (R² = 0.88) with the satellite-derived ETM+ data. The time-series NDVI (Normalized Difference Vegetation Index) data were then generated using the blended high spatio-temporal resolution data to capture the phenological differences between the C3 and C4 vegetation. The time-series NDVI revealed that the C3 vegetation turns green earlier in spring and senesces later in autumn than the C4 vegetation, while the C4 vegetation has a higher NDVI value than the C3 vegetation during summer. Based on these distinguishing characteristics, the time-series NDVI was used to extract the C3 and C4 classification features. Five features were selected from the 18 classification features according to the ground investigation data, and subsequently used for the C3 and C4 classification. The overall accuracy of the C3 and C4 vegetation classification was 85.75% with a kappa of 0.725 in our study area.
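For readers unfamiliar with the quantities reported above, the following sketch computes NDVI from near-infrared and red reflectance, and overall accuracy with Cohen's kappa from a confusion matrix, using the standard definitions. The reflectance values and the confusion matrix are made-up illustrations, not data from the study.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    total = c.sum()
    observed = np.trace(c) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = (c.sum(axis=0) * c.sum(axis=1)).sum() / total**2
    return observed, (observed - expected) / (1.0 - expected)

# Hypothetical values for illustration only.
example_ndvi = ndvi(0.5, 0.1)
accuracy, kappa = accuracy_and_kappa([[40, 10],
                                      [5, 45]])
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance.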
New lower order mixed finite element methods for linear elasticity
New lower order -conforming finite elements for symmetric
tensors are constructed in arbitrary dimension. The space of shape functions is
defined by enriching the symmetric quadratic polynomial space with the
-order normal-normal face bubble space. The reduced counterpart has only
degrees of freedom. In two dimensions, basis functions are
explicitly given in terms of barycentric coordinates. Lower order conforming
finite element elasticity complexes starting from the Bell element are
developed in two dimensions. These finite elements for symmetric tensors are
applied to devise robust mixed finite element methods for the linear elasticity
problem, which possess uniform error estimates with respect to the Lamé
coefficient , and superconvergence for the displacement. Numerical
results are provided to verify the theoretical convergence rates.
Comment: 23 pages, 2 figures
Accurate and Efficient Calculation of Three-Dimensional Cost Distance
Cost distance is one of the fundamental functions in geographical information systems (GISs). A 3D cost distance function makes it possible to analyze movement through 3D frictions. In this paper, we propose an algorithm and efficient data structures to accurately calculate the cost distance in discrete 3D space. Specifically, Dijkstra’s algorithm is used to calculate the least cost between initial voxels and all the other voxels in 3D space. During the calculation, unnecessary bends along the travel path are constantly corrected to retain the accurate least cost. Our results show that the proposed algorithm can generate true Euclidean distance in homogeneous frictions and can provide more accurate least cost in heterogeneous frictions than that provided by several existing methods. Furthermore, the proposed data structures, i.e., a heap combined with a hash table, significantly improve the algorithm’s efficiency. The algorithm and data structures have been verified via several applications including planning the shortest drone delivery path in an urban environment, generating volumetric viewshed, and calculating the minimum hydraulic resistance.
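The baseline that the abstract builds on, Dijkstra's algorithm over voxels with a heap and a hash table, can be sketched as follows. This is a plain 26-neighbor version under assumed conventions (friction stored per voxel in a dict, step cost taken as step length times the mean friction of the two voxels). It does not include the paper's bend correction, so in homogeneous friction it overestimates the Euclidean distance along directions that are not grid-aligned. The function name and grid are hypothetical.

```python
import heapq
import math
from itertools import product

def cost_distance_3d(friction, sources):
    """Least accumulated cost from the source voxels to every reachable voxel.

    friction: dict mapping (x, y, z) -> positive cost per unit length.
    sources:  iterable of voxels that are keys of `friction`.
    """
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    best = {s: 0.0 for s in sources}        # hash table of best known costs
    heap = [(0.0, s) for s in best]         # min-heap ordered by cost
    heapq.heapify(heap)
    while heap:
        cost, v = heapq.heappop(heap)
        if cost > best.get(v, math.inf):
            continue                        # stale heap entry, already improved
        for dx, dy, dz in offsets:
            u = (v[0] + dx, v[1] + dy, v[2] + dz)
            if u not in friction:
                continue                    # outside the grid or blocked
            step = math.sqrt(dx * dx + dy * dy + dz * dz)
            new_cost = cost + step * 0.5 * (friction[v] + friction[u])
            if new_cost < best.get(u, math.inf):
                best[u] = new_cost
                heapq.heappush(heap, (new_cost, u))
    return best

# Hypothetical 3 x 3 x 3 grid with homogeneous unit friction.
grid = {v: 1.0 for v in product(range(3), repeat=3)}
dist = cost_distance_3d(grid, [(0, 0, 0)])
```

On this grid the cost to (2, 1, 0) comes out as 1 + sqrt(2), which is larger than the straight-line distance sqrt(5). That gap is the kind of error the bend correction described in the abstract is meant to remove.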